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IV 


ABSTRACT 


A  new  data  assimilation  system  has  been  designed  and  implemented  at  the 
National  Center  for  Aeronautie  Meteorology  and  Climatology  of  the  Italian  Air  Foree 
(CNMCA)  in  order  to  improve  its  numerieal  weather  predietion  eapabilities  and  provide 
more  aeeurate  guidanee  to  operational  foreeasters.  The  system,  whieh  is  undergoing 
testing  before  eventual  operational  use,  is  based  on  an  “observation  spaee”  version  of  the 
3D-Var  method  for  the  objeetive  analysis  component,  and  on  the  High  Resolution 
Regional  Model  (H.R.M)  of  CNMCA  for  the  prognostic  component.  New  features  of  the 
system  include  completely  rewritten  correlation  functions  in  spherical  geometry, 
derivation  of  the  objective  analysis  parameters  from  a  statistical  analysis  of  the 
innovation  increments,  introduction  of  an  anisotropic  component  in  the  correlation 
functions,  solution  of  analysis  equations  by  a  conjugate  gradient  descent  method.  The 
analysis  and  forecast  fields  derived  from  the  assimilation  system  are  subjectively  and 
statistically  evaluated  through  comparisons  with  parallel  runs  based  on  European  Centre 
for  Medium  Range  Weather  Forecast  (ECMWF):  preliminary  results  of  these  studies  are 
also  presented. 
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I.  INTRODUCTION 


Already  in  1911  Bjerknes  had  clearly  shown  that  the  problem  of  weather 
forecasting  could  be  thought  of  as  an  “initial  condition  problem”  (Bjerknes,  1911).  By 
this  term  we  indicate  the  mathematical  problem  of  predicting  the  future  state  of  a  physical 
system  once  the  initial  conditions  of  the  system  are  known  together  with  the  equations 
governing  its  evolution  in  time. 

In  order  to  solve  it  successfully  Bjerknes  had  also  proposed  an  operative 
procedure  based  on  three  main  components: 

1 .  The  Observing  component; 

2.  The  Diagnostic  component; 

3.  The  Prognostic  component. 

A  worldwide  network  of  in-situ  and  satellite-based  observing  systems  today 
composes  the  Observing  component.  The  average  daily  numbers  of  the  more  common 
observations  disseminated  over  the  Global  Telecommunications  System  to  the  main 
weather  centers  are  summarized  in  Table  1.1  (ECMWF  Global  Data  Monitoring  Report, 
November  2001) 

This  enormous  quantity  of  data  is  composed  of  observations  irregularly 
distributed  in  space  and  taken  at  different  and  often  “asynoptic”  times.  The  Diagnostic 
component  of  the  forecasting  system  is  responsible  for  producing  an  estimate  of  the 
“true”  state  of  the  atmosphere  over  a  regular  spatial  grid  at  a  given  time. 

Starting  from  this  well  defined  initial  state,  the  “primitive”  equations  describing  the 
behavior  of  the  atmospheric  system  are  marched  forward  in  time  in  order  to  produce  an 
estimate  of  the  state  of  the  atmosphere  at  some  future  time  {Prognostic  component). 

The  main  subject  of  the  present  thesis  is  the  new  diagnostic  component  of  the 
CNMCA  forecasting  system:  its  design,  implementation  issues  and  preliminary  results. 
The  motivation  for  this  work  lies  mainly  in  the  deeply  felt  need  to  be  able  to  take 

advantage  of  the  increasing  number  of  satellite  derived  observations  which  are  nowadays 
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available  and  that  cannot  be  easily  accommodated  in  less  recent  objective  analysis 
algorithms. 

Also  a  brief  description  of  the  limited  area  model  (HRM)  used  in  the  Prognostic 
component  of  the  system  will  be  given,  together  with  a  discussion  of  the  main  types  of 
observations  currently  used  and  the  relevant  observation  error  statistics. 


Observation  type 

Number  of  Obs. 

Description 

SYNOP/SHIP 

53268 

Surface  weather  observations 
over  land  or  on  the  sea 

BUOY/DRIFTER 

6300 

Observations  from  moored/ 
drifting  buoys 

TEMP 

1162 

Upper  air  radiosonde 

observations 

TEMP/PILOT 

300 

Upper  air  wind  observations 

AIRCRAFT 

37359 

Automatic  and  manual 

observations  of  Temperature  and 
wind  from  aircraft 

SATOB 

230970 

Cloud  motion  winds  from 
geostationary  satellites  imagery 

NOAA  15/16  ATOVS 

612488 

Polar  satellite  derived 

temperature  and  humidity  profiles 

Table  1.1  Meteorological  observations  disseminated  daily  over  the  Global 
Telecommunications  System  (GTS) 


A.  ATMOSPHERIC  DATA  ASSIMILATION 

According  to  a  widely  accepted  definition  (Daley,  1991)  a  modern  data 
assimilation  system  is  composed  of  four  main  components: 

1 .  Data  quality  control; 

2.  Objective  analysis; 

3.  Initialization  of  the  analyzed  fields; 

4.  Short  range  run  of  the  prognostic  model  in  order  to  produce  an  initial  estimate  of 
the  atmospheric  state  for  the  successive  analysis  step  {First  guess  or  background 
fields). 
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A  schematic  representation  of  the  6-h  intermittent  data  assimilation  system  of 
CNMCA  is  given  in  Fig.  1.1.  A  brief  description  of  components  1.,  2.  and  4.  follows, 
while  discussion  of  the  proposed  new  realization  of  the  objective  analysis  component  is 
deferred  to  Chapters  2-3. 

1.  Data  Quality  Control 

The  data  quality  control  step  is  of  paramount  importance  in  order  to  prevent 
erroneous  data  from  being  fed  to  the  objective  analysis  step  with  deleterious  results  to  the 
performance  of  the  system.  Many  notable  cases  of  recent  failures  of  numerical  weather 
prediction  (NWP)  systems  to  correctly  forecast  high  impact  weather  conditions  (1999 
Christmas  storm  in  Europe,  January  2000  snow  storm  over  the  eastern  coast  of  the  US) 
can  be  attributed  to  the  inaccuracies  in  the  initial  conditions.  Often  these  problems  can  be 
traced  back  to  the  rejection  of  good  observations,  or  the  inclusion  of  faulty  ones,  by  the 
data  checking  algorithms. 

At  CNMCA  the  data  quality  control  of  the  observations  is  performed  in  two 
distinct  steps.  The  first  one,  called  "Observation  Pre-Processing"  has  the  purpose  of 
assigning  a  degree  of  confidence  to  each  reported  datum.  This  is  done  through  a  series  of 
checks  that  include: 


1 .  Check  against  climatological  gross  limits; 

2.  Internal  consistency  checks  (for  example  between  reported  and  recomputed 
heights  in  TEMP  messages); 

3.  Temporal  and  spatial  consistency  checks  for  observations  from  moving 
platforms  (for  example  SHIP  and  DRIFTER  messages). 


Observations  whose  confidence  level  is  below  the  70%  mark  are  discarded.  More 
details  can  be  found  in  Norris  (1990). 
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To  reduce  redundancy  of  information  and  prevent  possible  numerical  problems  in 
the  subsequent  solution  of  the  analysis  equations,  observations  whose  relative  distance  is 
less  than  the  average  grid  distance  of  the  numerical  model  (~55Km)  are  averaged 
together  and  combined  in  a  "Super-Observation"  (Lorenc,  1981). 

In  the  second  step  of  the  data  quality  control,  the  final  decision  on  whether  to 
accept  for  ingestion  the  observations  that  survived  the  pre-processing  step  is  taken.  At  the 
moment  the  decision  is  based  solely  on  the  normalized  distance  of  the  observations  with 
respect  to  the  background  field.  Although  this  method  is  statistically  accurate,  it  can  lead 
to  rejection  of  good  data  in  cases  of  rapidly  evolving  and  poorly  forecasted  weather 
situations.  Work  is  in  progress  on  a  buddy-check  type  of  algorithm  for  estimating  the 
accuracy  of  marginal  observations. 


2.  Initialization 

After  the  objective  analysis  step  has  blended  in  a  statistically  “optimum”  way  the 
information  from  the  first  guess  fields  and  the  new  observations,  a  set  of  analyzed 
meteorological  fields  is  produced  which  is  suitable  for  a  “synoptic”  use  (i.e.  diagnosis  of 
current  state  of  the  atmosphere  for  nowcasting  and  short  range  forecasting  purposes),  but 
not  as  initial  conditions  for  the  integration  of  a  primitive  equation  model.  The  main 
reason  for  this  lies  in  the  fact  that  the  imposed  balance  between  the  wind  and  mass 
observation  increments  are  linear  simplified  conditions  (approx,  geostrophic,  non- 
divergent),  while  the  first  guess  fields  implicitly  satisfy  the  multivariate  nonlinear 
conditions  of  the  numerical  model.  As  a  result,  the  integration  of  non-initialized  fields 
would  cause  the  model  to  go  through  a  geostrophic  adjustment  process  with  the 
excitation  of  inertia-gravity  waves  and  the  consequent  degradation  and  noisiness  of  the 
forecast  fields  in  the  first  6-12  h. 

To  avoid  these  undesirable  effects,  the  “Adiabatic  Implicit  Normal  Mode 

Initialization”  technique  is  used.  A  detailed  explanation  can  be  found  in  Temperton 

(1988),  but  the  main  ideas  can  be  summarized  as  follows.  The  analyzed  fields  are 

projected  over  the  normal  modes  of  a  linearized  version  of  the  model  equations.  These 
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normal  modes  can  be  classified  (at  least  for  the  extratropics)  based  on  their  respective 
frequencies  or  propagation  velocities:  “fast”  modes,  corresponding  to  inertia-gravity 
waves,  “slow”  modes,  corresponding  to  meteorological  Rossby  waves.  The  result  of  the 
projection  operation  are  two  sets  of  ordinary  differential  equations  whose  integration  in 
time  gives  the  time  evolution  of  the  amplitudes  of  the  normal  modes.  The  imposition  of 
appropriate  conditions  (so  called  Machenauer  conditions)  on  the  time  tendencies  of  the 
amplitudes  of  the  normal  modes  leads  to  an  effective  filtering  of  the  high-frequency 
modes  and  removal  of  spurious  numerical  noise  (see  pp. 377-385  in  Haltiner  &  Williams, 
1980). 


3.  Prognostic  Model 

The  numerical  model  used  to  produce  the  first  guess  fields  in  the  intermittent  data 
assimilation  scheme  is  the  High-Resolution  Regional  Model  (HRM)  of  CNMCA.  The 
HRM  is  a  modified  version  of  the  Deutscher  Wetterdienst  EM/DM  model  (Majewski, 
2001),  adapted  to  run  on  Compaq  Alpha  servers.  The  main  numerical  features  of  this 
hydrostatic  primitive  equation  model  are  summarized  below: 


1 .  Lat/Lon  rotated  coordinates  grid,  0.5°  resolution; 

2.  C-type  Arakawa  grid,  2"‘*  order  centered  finite  difference  scheme; 

3.  Hybrid  vertical  coordinate,  31  model  levels; 

4.  Split  semi-implicit  time  integration  scheme; 

5.  Integration  of  Helmholtz  equation  through  Fast  Fourier  Transform  and  Gauss 
method; 

6.  Davies  formulation  of  boundary  conditions; 

7.  4*  order  diffusive  damping  term. 
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As  for  the  physics  package  of  the  model,  its  main  characteristics  are: 


1 .  Rytter  and  Geleyn  (1992)  radiative  scheme; 

2.  Stratiform  precipitation  scheme  with  clouds  microphysics  parameterization; 

3.  Mass  flux  convective  scheme  (Tiedtke,  1989); 

4.  Two-level  vertical  diffusion  scheme  (Mellor  and  Yamada,  1974),  similarity 
theory  at  the  surface  (Louis,  1979); 

5.  Two-level  soil  scheme; 

The  operational  area  of  model  integration  is  shown  in  Fig.  1.2.  Embedded  in  this 
model  is  another  version  of  HRM  (called  Med-HRM)  run  at  double  horizontal  resolution 
(0.25°  effective  grid  spacing)  over  the  Mediterranean  basin  (Fig.  1.3).  Three  hourly 
ECMWF  forecast  fields  give  the  boundary  conditions  for  the  HRM  model. 

The  interface  between  the  objective  analysis  step  and  the  prognostic  model  is 
realized  through  three  additional  software  packages: 

1.  Insertion:  spline  interpolation  of  analyzed  fields  on  pressure  levels  to  hybrid 
coordinate  model  levels; 

2.  IFS2HRM:  interpolation  and  adaptation  of  ECMWF  boundary  fields  to  HRM  grid 
and  prognostic  variables; 

3.  Daily  blending  of  CNMCA  12Z  analysis  fields  with  ECMWF  12Z  analysis  fields. 


For  testing  purposes,  the  assimilation  cycle  is  run  on  a  Compaq  DS20E  server, 
with  a  3-h  data  window  around  the  analysis  nominal  time.  The  run  time  for  an  average 
number  of  independent  observations  (-3500)  is  around  45  minutes,  which,  considering 
the  time  necessary  for  the  post-processing  elaborations,  leads  to  the  availability  of  the 
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analyzed  fields  three  hours  after  the  analysis  nominal  time.  Twice  daily  (at  OOZ  and  12 
Z),  an  extended  run  (+48h)  of  the  HRM  model  based  on  the  assimilation  cycle  analysis  is 
performed. 
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Figure  1 . 1  Data  Assimilation  Cycle  at  CNMCA. 


8 


EURO-HHM  DOMAIN  -  UGM/CNMCA 

tH'W  lie'-W  130- W  150  E  laOi'E  lie-E 


M-E 


™-E 


«1-E 


Eff '  W  \ 


/%r  fi  ^  ^ 

<»'«/  ■-.  /  j  ■  V 


zo'wi  iq'w 


5" 


■□■( 


Figure  1 .2  Regional  model  (HRM)  domain  of  integration. 
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Regional  model  (MED-HRM)  domain  of  integration. 
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II.  MULTIVARIATE  VARIATIONAL  DATA  ASSIMILATION 


Spatial  objective  analysis  of  meteorological  fields  can  be  traced  back  to  the 
pioneering  work  of  a  group  of  research  workers  (J.  Von  Neumann,  J.  Charney,  J. 
Smagorinsky  and  others)  at  Princeton  Institute  of  Advanced  Studies  in  the  late  1940s  (see 
Daley,  1991,  for  more  details  and  a  historical  perspective).  The  early  techniques  were 
based  on  the  concept  of  function  fitting:  the  meteorological  field  to  be  analyzed  was 
expanded  in  a  finite  series  of  known  functions.  The  coefficients  of  the  expansion  were  to 
be  determined  by  a  least  square  minimization  of  some  function  of  the  distance  between 
the  fitting  function  and  the  observation.  The  procedure  leads  to  the  solution  of  a  linear 
system  or,  equivalently,  the  inversion  of  a  square  matrix  (Gram  matrix),  which,  in  the 
case  of  global  fitting,  can  become  extremely  large  and  expensive  to  compute.  Due  to  this 
and  other  technical  problems  (possible  ill-conditioning  of  the  Gram  matrix,  underfitting 
or  overfitting  due  to  the  non-stationariety  of  the  observing  system)  the  function  fitting 
technique  was  quickly  superseded  in  operational  practice  by  the  method  of  successive 
corrections  (SCM)  (Bergthorsson  and  Doos,  1955).  In  this  technique,  the  function  fitting 
is  local  instead  of  global  (i.e.,  only  the  observations  within  a  predetermined  radius  of  the 
analyzed  grid  point  influence  the  analysis)  and  the  weights  are  specified  a-priori  as 
monotonically  decreasing  functions  of  the  distance  between  observation  and  analysis  grid 
point.  Besides  introducing  a  very  efficient  and  robust  algorithm,  Bergthorsson  and  Doos 
also  brought  in  a  number  of  other  ideas  which  are  still  in  wide  use  in  operational 
objective  analysis  schemes:  the  use  of  a  forecast  first  guess  (or  background)  field  and  the 
computation  of  the  a-priori  weights  through  a  statistical  analysis  of  the  objective  analysis 
errors.  For  all  these  attractive  properties,  the  SCM  has  some  important  limitations,  at  least 
in  its  simpler  formulations.  It  gives  too  much  weight  to  the  observations  and  it  ignores  the 
cross-correlations  between  observations. 

Although  most  of  the  problems  can  be  overcome  with  more  sophisticated,  iterated 
version  of  the  algorithm  (Bratseth,  1986),  in  operational  practice  the  SCM  algorithm  was 
superseded  in  the  1980s  by  a  direct  method  of  solution  of  the  analysis  equations  known 

as  “Statistical  Interpolation  ”  or  “Optimal  Interpolation  ”  (01).  The  roots  of  the  method 
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go  back  to  the  work  of  Kolmogorov  (1941)  and  Wiener  (1949)  and  it  is  interesting  to  note 
that  01,  as  other  recent  teehniques,  found  its  early  applieations  not  in  meteorology  but  in 
other  fields  of  the  Earth’s  sciences  and  also  in  engineering.  The  seminal  work  that 
brought  the  teehnique  to  the  attention  of  the  meteorologieal  community  was  that  of 
Gandin  (1963),  who  gave  a  theoretical  derivation  of  the  algorithm  and  discussed  the  main 
issues  of  its  application  in  an  operational  environment. 

The  01  method  derives  its  name  from  the  fact  that  it  tries  to  minimize  (optimality 
principle)  the  expeeted  varianee  of  the  analyzed  fields  through  the  a-priori  knowledge  of 
the  error  characteristics  of  both  the  observations  and  the  background  fields.  In  operational 
practice  a  number  of  assumptions  are  made  whieh  make  the  method  sub-optimal.  Apart 
from  fundamental  issues  sueh  as  the  imperfect  knowledge  of  the  observations  and 
background  fields  error  statistics,  which  are  also  often  assumed  stationary,  homogeneous, 
isotropie  and  separable  into  the  horizontal  (or  constant  pressure)  and  the  vertical 
components,  the  main  approximations  eommonly  used  are: 

1 .  Loeal  approximation:  only  a  limited  number  of  observations  enter  into  the  solution 

of  the  analysis  equations  for  any  given  grid  point  (or  analysis  volume,  Lorene, 

(1981)); 

2.  Only  observations  that  are  linearly  related  to  the  prognostic  variables  of  the  model 

are  commonly  used,  although  extensions  of  the  method  to  handle  non-linear 

observations  are  possible  (Ledvina  and  Pfaendtner,  1995). 

In  the  1990s,  due  to  progress  in  computer  proeessing  speed  and  data  storage 
eapabilities,  the  variational  approach  to  the  objective  analysis  problem  has  gained 
popularity  as  an  elegant  and  powerful  mathematical  formalism  capable  of  overcoming 
some  of  the  problems  mentioned  above  (Parrish  and  Berber,  1992;  Rabier  and  Courtier, 
1992;  Cohn  et  al.,  1998;  Daley  and  Barker,  2000).  The  methods  developed  by  these 
authors  are  known  under  the  general  name  of  3D-Var  algorithms  and,  apart  from 
implementation  details,  are  basically  equivalent.  A  scalar  measure  (usually  ealled  cost 
function)  of  the  distanee  between  the  observations  and  baekground  fields  to  the  analysis 
fields  is  minimized  with  respeet  to  the  unknown  analysis  values,  thus  producing  the 


12 


maximum  likelihood  estimate  of  the  atmospheric  state.  In  the  case  of  quadratic  cost 
functions,  the  minimization  procedure  leads  to  a  set  of  linear  analysis  equations  that  are  a 
generalization  of  the  01  analysis  equations.  This  is  not  a  mere  coincidence  since,  for 
stochastic  variables  with  a  Gaussian  error  distribution  (as  many  weather  fields  can  be 
thought  of  having),  it  can  be  shown  (Daley,  1991)  that  the  minimum  variance  estimate 
and  the  maximum  likelihood  estimate  are  coincident. 

In  the  3D-Var  methods  a  basic  assumption  of  01  is  retained,  namely  that  of  the 
stationary  character  of  the  background  error  statistics.  Relaxation  of  this  hypothesis  leads 
to  various  modem  analysis  techniques,  commonly  known  as  Four  Dimensional  Data 
Assimilation  Methods  (FDDA),  which  can  be  roughly  divided  in  two  main  categories 
(Menard,  1994):  time  extensions  of  variational  methods,  known  as  4D-Var,  where  the  fit 
to  the  data  is  performed  over  an  extended  period  of  time  (instead  of  intermittently)  using 
the  model  as  a  strong  constraint  (i.e.,  strict  consistency  with  the  model,  assumed  perfect, 
is  required);  methods  derived  from  estimation  theory  (Kalman-Bucy  filter  and  its  various 
modifications  and  extensions).  In  these  latter  methods  no  perfect  model  assumption  is 
required;  however,  common  to  both  kind  of  algorithms  are  the  great  computational  and 
storage  requirements  that,  at  present,  make  them  unfeasible  options  except  for  the  largest 
weather  centers. 

In  the  present  work  we  concentrate  on  the  version  of  the  3D-Var  algorithm  which 
forms  the  basis  of  the  new  objective  analysis  system  at  CNMCA.  To  the  author’s 
knowledge,  it  was  first  implemented  operationally  at  the  Data  Assimilation  Office, 
NASA  Goddard  (Cohn  et  al,  1998),  and  more  recently  at  Naval  Research  Laboratory, 
Monterey  (Daley  and  Barker,  2000).  The  basic  theory  will  be  reviewed,  then  a  thorough 
description  of  the  background  error  covariance  models  used  in  the  CNMCA 
implementation  of  the  algorithm  will  be  given.  Finally  the  topic  of  a  possible  anisotropic, 
flow-dependent  extension  to  the  standard  covariance  modeling  will  be  addressed. 


A.  THEORY  OF  3D-VAR 

The  following  is  only  a  short  account  of  the  main  ideas  of  3D-Var  which  are 

relevant  to  the  present  work:  the  interested  reader  can  find  more  details  and  a  wider 
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theoretical  background  in  the  works  of  Lorenc,  1986;  Daley,  1997;  Cohn,  1997;  Cohn  et 
ah,  1998;  Daley  and  Barker,  2000. 

Let  j  be  the  column  vector  containing  all  the  p  (~10"^)  observations  at  the  analysis 
time;  jc,  the  column  vector  of  the  true  values  of  the  n  (~10^)  state  (prognostic)  variables  at 
the  same  time  at  the  model  grid  points;  Xa  and  Xb  the  analogous  quantities  for  the  analyzed 
and  background  fields;  then  the  forecast  error  vector  is: 


(2.1) 


while  the  observation  error  vector: 


eo=y-H(X()  (2.2) 

where  H  is  the  observation  (or  forward)  operator,  i.e.  the  operator  that  performs  the 
transformation  from  the  state  variables  on  grid  points  to  the  observed  variables  at  the 
observing  locations  (in  the  case  of  linearly  related  variables  it  reduces  to  an  interpolation 
operator).  We  make  the  usual  assumptions  that  these  error  vectors,  e*  and  Co  have  normal 
distribution  functions  with  zero  mean  (i.e.:  no  bias)  and  are  mutually  uncorrelated. 

From  these  we  can  now  define  the  forecast  error  covariance  matrix: 

Pb=  <  ebej>  (2.3) 

and  the  observation  error  covariance  matrix: 

R  =  <eoeo^>  (2.4) 

By  definition,  both  matrices  are  symmetric,  positive  definite,  with  dimensions  nxn 
and  pxp  respectively. 
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Given  the  statististieal  assumptions  made  on  Cb  and  Cg,  it  ean  be  shown  (Daley, 
1991,  Chap.2)  that  the  maximum  likelihood  estimate  of  the  state  of  the  atmospherie 
system  is  the  one  the  minimizes  the  following  cost  function; 

J  =  0.5 [y  -  H(xa)fR'‘[y  -  H(xa)]  +  0.5 [xb  -  XafPb‘[xb  -  xj  (2.5) 

2 

i.e.,  minimize  a  scalar  distance  in  L  of  the  analysis  fields  from  both  the  observations  and 
the  first  guess  fields  based  on  their  respective  perceived  accuracies. 

To  find  the  minimum  of  the  cost  function  (2.5),  we  differentiate  it  with  respect  to 
Xa,  obtaining  the  gradient  of  J: 

VJ  =  -  H^R  ^y  -  H(xa)]  +  Pb'xb  (2.6) 

where  we  made  use  of  the  property  of  bilinear  symmetric  forms  dldx(x^Ax)  =  (A  +  A’^)x 
=  2 Ax,  and  introduced  the  Jacobian  matrix  of  the  observation  operator  H: 


H  =  d/dx(H)  (2.7) 

A  necessary  condition  for  a  minimum  is  obtained  by  setting  VJ=  0: 


-  H^R  ‘[y  -  H(xa)]  +  Pb^Xb  +  H^R^H[Xa  -  xJ  -  H^R  ‘H[Xa  -xJ  =  0  (2.8) 

T  -1 

where  the  vector  H  R~  H[Xa  -  XbJ  has  been  added  and  subtracted.  Assuming  the  analysis 
fields  to  be  only  first  order  corrections  to  the  first  guess  fields  {^"Tangent  Linear” 
approximation)  we  have: 


H(Xa)  =  H(Xb  -  (Xa  -  Xb))  =H(Xb)  -  H(Xa  -  Xb) 
which,  when  substituted  into  (2.8),  yields: 


(2.9) 
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-  H^R^[y  -  H(xh)]  +  (Pb'+  hVh)  (xa  -Xb)  =  0 


(2.10) 


or, 


Xa-Xb  =  (Pb'+  H^R^H)^  H^R  '[y - H(xb)]  (2. 1 1) 

This  form  of  the  analysis  equations  involves  eovarianee  matriees  eomputed  in  the 
grid  spaee  of  the  model  (or,  equivalently,  in  its  speetral  spaee),  whieh,  in  the  present  ease, 
would  involve  matriees  of  order  -10^x10^  (although,  in  speetral  space  and  under  the 
assumptions  of  homogeneity  and  horizontal  isotropy  for  correlations,  they  can  be  shown 
to  be  diagonal  or  block  diagonal:  Courtier,  1997;  Courtier  at  al,  1998). 

A  less  expensive  but  equivalent  form  of  the  analysis  equations  can  be  derived 
making  use  of  the  Sherman-Morrison-Woodbury  formula  (for  the  details  of  the  derivation 
see  Courtier,  1997),  or,  more  simply  by  noting  that: 

H^R  ‘  (Pb^+  H^R  ‘H)  ‘  =  (HPb  H^+  R)  Pb  (2.12) 

and  that  all  the  matrices  considered  and  their  inverse  are  positive  definite.  Inserting  2.12 
into  2.1 1  we  find: 

Xa-Xb=  Pb  (HPb  H^+  R)-‘[y  -  H(xb)J  (2.13) 

This  formulation  is  the  one  used  in  the  present  work.  To  the  author’s  knowledge, 
it  was  first  implemented  in  the  NASA/Goddard  Data  Assimilation  Office  “Physical  Space 
Statistical  Analysis  System”  (Cohn  et  ah,  1998)  and,  more  recently,  in  the  NAVDAS 
assimilation  system  of  NRL  Monterey  (Daley  and  Barker,  2000). 

From  inspection  it  is  clear  that  we  are  dealing  with  an  observation  space 
algorithm:  R  is  defined  in  observation  space  while  Pb  is  projected  into  it  by  means  of  the 
observation  operator  H.  Taking  into  account  the  average  number  of  observations 
currently  entering  into  the  CNMCA  assimilation  system  (~10"^),  we  can  see  that  this 
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approach  is  able  to  considerably  reduce  the  computational  and  storage  load  on  the 
computing  facilities  without  placing  any  constraint  on  the  eovariance  function  models. 

The  main  differences  with  respeet  to  standard  01  algorithms  are  twofold;  first 
there  is  the  explicit  use  of  the  observation  operator  H,  whieh  will  provide  a  natural  way  to 
extend  the  objective  analysis  to  include  observations  related  in  a  nonlinear  way  to  the 
prognostic  variables  of  the  model  (see  below);  secondly,  as  it  will  be  shown  below,  the 
solution  to  (2.12)  is  computed  globally  (i.e.:  making  use  of  all  available  observations  for 
every  grid  point),  without  the  need  for  data  selection  procedures. 


The  extension  of  (2.12)  to  non-linear  observations  (sueh  as  radiances,  wind 
speeds,  total  column  water  content,  etc.,  where  H  depends  on  the  model  state  x)  is 
conceptually  simple:  we  define  a  series  of  system  states: 


Xq  ^2y  •  •  •y^i-h^b^i+ly  •  ♦  • 


(2.14) 


where  the  starting  state  coincides  with  the  first  guess  fields. 

Then,  defining  the  Jacobian  of  the  observation  operator  around  the  i  state  of  the 

system 


Hi  =  d/dx(H)^^,  (2.15) 

the  nonlinear  version  of  the  analysis  equations  is  found  by  looking  for  a  Newtonian 
iterative  solution  to  the  problem  of  finding  the  root  of  the  Jaeobian  of  the  cost  function 
( VJ=0;  eq.  2.8)  :  starting  from  a  first  guess  state  Xo=Xb,  the  i+1  state  is  given  by: 


Xi+i  =  Xi-S^J(Xi)~^  ■  VJ(xi)  (2.16) 

where  \^J(Xi)~^  =  (Pt^  +  H^iR'^  Hi)  is  the  Hessian  matrix  of  the  cost  function  J  and  it  can 
be  shown  to  be  the  inverse  of  the  analysis  error  covariance  matrix. 
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After  some  manipulation,  2.16  ean  be  reeast  in  the  eomputationally  more  effieient 


form: 


Xi+,  =  xo  +  Pb  H^i  (Hi  Ph  H^i  +  R)-'[y  -  H(xi)  +  H(Xi  -  xo)J  (2.17) 

The  iterative  proeedure  stops  when  the  difference  between  Xi+i  and  Xi  becomes 
smaller  than  a  predefined  value.  Unlike  the  linearized  version  of  the  analysis  equations, 
which  result  from  a  quadratic  cost  function,  the  minimization  procedure  (2.15)  cannot  be 
guaranteed  to  converge  and  it  can  have  more  the  one  minimum.  Up  to  now  only 
observations  linearly  related  to  the  prognostic  variables  of  the  model  have  been  used  in 
the  experimental  phase. 


B.  IMPLEMENTATION  ISSUES 

The  practical  implementation  of  objective  analysis  equations  (2.13)  has  been 
carried  out  as  follows.  The  set  of  equations  (2.13)  can  be  solved  in  two  steps.  First 
solution  of  the  linear  pxp  system: 

(HPbH^+R)z=y-H(xh)  (2.18) 

in  the  unknown  vector  z.  Secondly,  projection  of  the  solution  on  grid  space  via: 

Xa-Xb=PbH^z  (2.19) 

The  second  step  amounts  to  perform  a  matrix-vector  product  between  the  pxn 
matrix  Pb  H  and  the  p  vector  z,  which  can  be  computed  efficiently  in  Fortran90  code. 

The  first  step  involves  the  solution  of  a  large,  sparse,  symmetric  and  definite 
positive  linear  system.  It  can  be  shown  (Golub  and  van  Loan,  1996)  that  this  step  is 
mathematically  equivalent  to  finding  the  minimum  of  the  following  cost  function: 

F(z)  =  l/2z^(HPb  H^+  R)z  -z'^(y-  H(xb))  (2.20) 

which  can  be  done  using  a  standard  Conjugate  Gradient  (CG)  Algorithm  (Subroutine 
FI  IGBZ,  NAG  Library,  Mark  17,1995,  has  been  used). 
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In  numerical  experiments,  it  has  been  found  convenient  to  implement  a  scaled 
version  of  2.13.  This  is  due  to  the  reason  that,  in  this  way,  the  condition  number  of  the 
matrix  (HPbH^  +  R)  (defined  as  the  absolute  value  of  the  ratio  of  the  maximum  to  the 
minimum  eigenvalue)  shows  a  considerable  decrease,  thus  speeding  up  the  convergence 
of  the  descent  algorithm. 

The  scaled  version  of  (2.13)  is  found  following  Daley  and  Barker  (2000).  If  we 
redefine  the  observation  operator  H  as  the  product  of  a  spatial  interpolation  operator  /T* 
and  a  real  observation  operator  H  (i.e.  H*  H)  and  set: 


Pb  hJ=  and,  Ph  hJ= 

(2.21) 

p^gr/objjT^  fj^gr/ob 

(2.22) 

jjp  ob/ob  jjT  ob jl/2  ^  ob/ob  ob jl/2 jjT 

(2.23) 

Sh=  diag(H*  Pb’"^"''  H/) 

(2.24) 

Where  Sb=diag{Pb),  (i.e.  the  background  variances  of  the  analysis  variables), 
Sb'’=diag{Pb^),  (i.e.  the  observation  variances  of  the  analysis  variables),  and 
are  the  corresponding  correlation  matrices.,  then  the  scaled  form  of  2.13  is: 

Xa-Xb=  Sb'^'  [Sb^Y'H^  Sb'^'j  Sb^'^fy  -  H(xb)] 

(2.25) 

and  =  Sh^^^H[Sb''f^  Cb’'^°’'[SbY^H^Sh^^\ 

The  main  point  here  is  the  rescaling  of  the  (H  Ph  +  R)  covariance  matrix  in  the 

non-dimensional  form  [Ch^^°^+  SY^R  SY^]. 

In  experimental  runs,  without  preconditioning,  for  an  average  number  of 
observations  (p~3500),  the  cost  function  (2.20)  converges  to  within  machine  precision  in 
around  150  steps  (average  time  ~90  seconds  on  Compaq  DS20E  server).  Since  the 
computational  cost  of  the  algorithm  scales  as  number  of  CG  iterations  multiplied  by  ,  it 
is  important,  in  view  of  ingesting  asynoptic  type  of  observations,  to  implement  an 
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effective  pre-conditioner  for  the  (H  Ph  +  ify)  matrix.  Work  in  this  direction  is  still  in 
progress. 


C.  BACKGROUND  ERROR  COVARIANCES 

"'The  most  important  element  in  the  statistical  interpolation  algorithm  is  the 
background  error  covariance  matrix.  To  a  large  extent,  the  form  of  this  matrix  governs 
the  resulting  objective  analysis...  ”  (Daley,  1991).  Much  effort  has  so  far  gone  into  the 
redefinition  of  the  background  error  covariance  model.  The  derivation  follows  the  main 
ideas  of  Bergman  (1979),  Thiebaux  et  al.  (1986),  Thiebaux  at  al.  (1990),  Tillmann 
(1999).  An  original  contribution  has  been  the  explicit  formulation  of  the  temperature- 
wind  cross  correlations  in  spherical  coordinates,  using  the  thermal  wind  relationship  as  a 
constraint.  This,  to  the  author’s  knowledge,  is  not  to  be  found  in  the  relevant  literature. 

The  need  to  express  the  auto  and  cross-correlations  in  spherical  coordinates 
instead  of  using  a  local  plane  projection  arises  from  the  fact  that,  for  the  correlation 
model  used,  one  has  non  negligible  correlation  values  at  distances  comparable  to  the 
Earth’s  radius. 

Taking  into  account  the  narrower  structure  shown  by  the  background  vertical 
correlations  of  temperature  with  respect  to  those  of  geopotential  and  the  weaker  vertical 
correlation  of  radiosondes’  temperature  measurements  with  respect  to  geopotential 
observations,  the  choice  was  made  to  adopt  the  following  analysis  variables:  temperature 
(T),  zonal  (u)  and  meridional  (v)  components  of  wind,  surface  pressure  (SP)  and  relative 
humidity  (RH). 


1.  Upper  Air  Analysis  Covariance  Model 

The  upper  air  analysis  is  multivariate  in  (T,  u,  v).  The  covariance  model  has  been 
derived  as  follows. 


Starting  from  the  geostrophic  constraint: 
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u  =  -(k/sin(p)  d^ldcp 
V  =  k/(sin(pcos(p) 


(2.26) 

(2.27) 


where  ((p,  X)  are  the  latitude-longitude  eoordinates,  0  is  the  geopotential,  and  k=n/2Q  is  a 
eonstant  that  takes  into  account  the  geostrophic  coupling  parameter  ji  and  the  orbital 
angular  velocity  of  the  Earth  Q.  Making  use  of  the  equation  of  state  for  dry  air  (p=pRdT) 
and  the  hydrostatic  equation  {dpld0=-p),  (2.25-26)  can  be  recast  as  a  form  of  the  thermal 
wind  constraint: 

du/dp  =  kRc/(p  sincp)  dTIdcp  (2.28) 

dv/dp  =  -k/(psin(pcos(p)  dTIdX  (2.29) 

Under  the  usual  hypothesis  of  homogeneity,  isotropy  and  separability  for  the 
temperature  autocorrelation  function,  we  assume  the  following  functional  form  for  the 
temperature  covariance: 


Cov(n  Tj)  =  ar'  R(x)  x^^iPuPj)  (2.30) 

where  gt  is  the  background  error  temperature  variance  (which  can  vary  both  in  latitude 
and  in  pressure  level),  R(x)  is  the  quasi-horizontal  (isobaric)  component  of  the  correlation 
model  (function  only  of  the  Great  Circle  distance  x  =  cos'^ (sincp isincpi  +  coscpicoscpj 
cos  /zl/l/)of  points  i,j)  and/^^  is  the  vertical  part  of  the  correlation  function.  Following 
Thiebaux  et  al.  (1986)  and  Daley  and  Barker  (2000),  the  functional  representation  for 
R(x)  has  been  chosen  as  a  Second  Order  Autoregressive  (SOAR)  Function  of  the  form: 

R(x)  =  (1  +c  x)exp(-c  x)  (2.31) 

Where  the  length  scale  c‘  will  be  specified  through  a  statististical  analysis  of  the 
observed  minus  forecast  increments,  as  will  be  shown  in  Chapter  3. 
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The  derivation  that  follows  is,  however,  independent  of  the  speeifie  form  of  either  R(t)  or 
Making  use  of  (2.27-28)  is  easily  seen  that: 


d/dlnpi  Cov(Ui,Tj)  =  gt  kRd/sin(pidR(x)/dx  dr/d(piX^^(Pi,Pj)  (2.32) 

d/dlnpj  Cov(Ti,Uj)  =  gt  kRd/sincpj  dR(x)/dx  dx/d(pj  x^^(Pi,Pj)  (2.33) 

d/dlnpi  Cov(Vi,Tj)  =  -  gt  kRd/(sin(piCos(pi)  dR(x)/dx  dx/dXi x^^ (puPj)  (2.34) 

d/dlnpj  Cov(Ti,Vj)  =  -  gj  kRci/(sin(pjCos(pj)  dR(x)/dx  dx/dXj  x^^ (Pi,Pj)  (2.35) 

^/(dlnpidlnpj)  Cov(Ui,Uj)  =  (GjkRdf /(sirKpiSincpi)  (d^R(x)/dx^ dx/dcpidx/dcpj  + 

dR(x)/dx  ^ x/d(pid(pj)  x^^iPuPj)  (2.36) 

^/(dlnpidlnpj)  Cov(Vi,Vj)  =  (gt  kRdf/(sin(piCos(piSin(piCos(pj)  (d^R(x)/dx^ dx/dkidx/dkj 
+  dR(x)/dx  ^x/dkidkj)  x^^(Pi,Pj)  (2.37) 

^/(dlnpidlnpj)  Cov(Ui,Vj)  =  -  (gt  kRdf/(sin(piSin(piCos(pj)  (d^R(x)/dx^ dx/dtpidx/dkj  + 
dR(x)/dx  ^ x/dcpidkj)  x^^(Pi>Pj)  (2.38) 

^/(dlnpidlnpj)  Cov(Vi,Uj)  =  -  (gt  kRdf/(sin(piSin(piCos(pj)  (d^ R(x)/dx^ dx/d(pjdx/dki  + 


dR(x)/dx  ^ x/dcpjdki)  x^^(Pi’Pj)  (2.39) 

Similarly  to  (2.29)  the  other  covarianees  ean  be  written  as: 

Cov(Ui,Tj)  =  Gu  GTR^^(ri,rj)x"^(Pi,Pj)  (2.40) 

Cov(Vi,  Tj)  =  Gy  GTR'’^(ri,rj)  x'’^(Pi,Pj)  (2.41) 

Cov(Ui,Uj)  =  Gu  R""(ri,rj)  x““(Pi,Pj)  (2.42) 

Cov(Vi,Vj)  =  Gy  Rk^(ri,rj)  x'’'’(PhPj)  (2.43) 

Cov(Ui,Vj)  =  Gu  Gy  R“''(nrj)  x“'’(Pi>Pj)  (2.44) 

and  substituted  into  (2.25-32),  thus  obtaining  for  the  vertical  components: 

x““(PiPj)  =  X^(PiPj)  =  x“'’(Pi’Pj)  =  x'’“(PiPj)  (2.45) 
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x“^(Pi>Pj)  =X^(PiPj)  =  d/dlnpjx‘“(Pi,Pj)  (2.46) 

X^“(Pi’Pj)  =X^'’(PiPj)  =  d/dlnpix"‘'(Pi,Pj)  (2.47) 

X^^(Pi’Pj)  =  ^/(^Inpidlnpj)  x““(Pi,Pj)  (2.48) 

while  for  the  isobaric  components  we  find: 

cr„  R^^(ri,rj)  =  ot  kRd/sin(pi  dR(x)/dx  dx/d(pi  (2.49) 

Ov  R^^(ri,rj)  =  -  Ot  kRd/(sin(pi  cos(pi)  dR(x)/dx  dx/dXi  (2.50) 

Ou  R"“(ri,rj)  =  (aTkRdf/(sin(piSin(pi)  (d^R(x)/dx^ dx/d(pidx/d(pj  + 

dR(x)/dx^x/d(pid(pj)  (2.51) 

au  cTv  R“'’(ri,rj)  =  -  (aTkRdf/(sin(pi  sincpi  coscpj)  (d^R(x)/dx^ dx/dcpidx/dkj  + 

dR(x)/dx^  x/dcpidkj)  (2.52) 

Ov  R^^ifurj)  =  (aTkRdf/(sin(piCos(piSin(piCos(pj)  (d^R(x)/dx^dx/dlidx/dlj  + 

dR(x)/dx^  x/dXidXj)  (2.53) 

Setting 

Lim(t^o)  dR(x)/dx  =L  (2.54) 

we  obtain  for  the  geostrophically  constrained  wind  variances: 

2  2  2 
Lim(t^o)  Cov(Ui,Uj)  =  Ou  =  Lim(t^o)  Cov(Vi,Vj)  =  =  -(ajkRd/sifKpi)  L  (2.55) 

Substituting  (2.54)  into  (2.48-52)  we  completely  determine  the  isobaric  correlations: 

R“^(ri,rj)  =  dR(x)/dx  dx/dtpi  (2.56) 

R''^(ri,rj)  =  -  (cos(pi)~^  dR(x)/dx  dx/dXi  (2.57) 

R"'^(ri,rj)  =  (d^R(x)/dx^ dx/d(pidx/d(pj  +  dR(x)/dx^x/d(pid(pj)  (-L)~‘  (2.58) 
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R“^(ri,rj)  =  -  (d^R(x)/dx^ dz/dfidr/dXj  +  dR(x)/dx^ x/d(pidXj)  (-L  cos<pj)~^  (2.59) 
R^''(ri,rj)  =  (d^ R(x)/dx^ dx/dXidx/dXj  +  dR(x)/dx^x/dXidXj)  (-  L  coscpi  cos(pj)~^  (2.60) 


The  above  formulas  are  independent  of  the  chosen  correlation  model.  In  the 
present  case,  assuming  (2.29)  as  the  functional  representation  for  the  isobaric  temperature 
autocorrelation  model,  we  obtain: 


R“^(ri,rj)  =  c(x/sinx)exp(-cx)(  cosfi  sinfj-  cosfj  sinfi  cos  I  AX  I)  (2.61) 

R^yr.rj)  =  -  R^^(ri,rj)  (2.62) 

RX^(ri,rj)  =  c(x/sinx)exp(-cx)(cos(pj  sin(Xi-Xj))  (2.63) 

R^^(r,rj)=-R^^(r,rj)  (2.64) 

R““(ri,rj)  =  -  ((sinx(l-cx)-  xcosx)/(sin  x)  (coscpi  sincpj-  coscpj  sin<pi  cos  I  AX  j) 

(coscpj  sincpi-  cosfi  sincpjcos  IaxI)  -  (x/sinx)  (cosfi  sincpj- 

cosfj  sin(pi  cos  / AX  / ))  exp(-cx)  (2.65) 

RX^(ri,rj)  =  ((sinx(l-cx)-  xcosx)/(sin  x)  (coscpi  coscpjsin  j  AX  j)  + 

(x/sinx)  cos  IaxI))  exp(-cx)  (2.66) 

R“^(ri,rj)  =  ((sinx(l-cx)-  xcosx)/(sin\)  (cosfi  sincpj- cosfj  sin(pi  cos  IaxI) 

(cos(pi  sin(Xi-Xj)  +  (x/sinx)  ( sincpi  sin(Xi-Xj))  exp(-cx)  (2.67) 


The  above  isobaric  correlations  are  shown  in  Fig.2.1  through  2.6. 
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TiT  Co<Tflia1lQri  FumiUo?> 


Figure  2.1  T-T  correlation  function  (SOAR  Model,  c  =  0.1  rad). 


U-T  LornUillBn  FLrEllcn 


Figure  2.2  U-T  correlation  function  (SOAR  Model,  c  =  0.1  rad). 
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V-f  ‘Currvtebavi  PLTclJBn 


Figure  2.3  V-T  correlation  function  (SOAR  Model,  c  =  0.1  rad). 


U-U  FuiiUd^ 
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Figure  2.4  U-U  correlation  function  (SOAR  Model,  c  =  0.1  rad). 
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V-V  ComlaliBn  hunduin 


Figure  2.5  V-V  correlation  function  (SOAR  Model,  c  =  0.1  rad). 


U-V  CDm^illDTT  Funcbon 


Figure  2.6  U-V  correlation  function  (SOAR  Model,  c  =  0.1  rad). 
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The  vertical  part  of  the  correlation  function  has  been  modeled  after  Bergman 


(1979): 


x‘“(Pi’Pj)  =  (1  +kplog^  (p/Pj)/^  (2.68) 

which,  for  the  other  correlations,  gives: 

X^(Pi’Pj)  =x“'’(Pi’Pj)  =x'’"(Pi’Pj)  =  (l+kplog^(pi/pj))~‘  (2.69) 

x“^(PiPj)  =x'’^(Pi’Pj)  =  2kp^Hog(pi,pj)  (l+kplog^(pi/pj)f  (2.70) 

X^yPiPj)  =X^'’(Pi’Pj)  =  -2kp^^log(pi,pj)  (l+kplog^(pi/pj))~^  (2.71) 

X^^(PiPj)  =(1-  4kplog  (pi,pj)  x'“(Pi,Pj))  (l+kplog^(pi/pj)X^  (2.72) 


Sketches  of  the  functions  are  given  below  for  kp=5,  in  Fig.  2.7  through  2.10. 

The  vertical  correlation  parameter  kp  will  be  determined  through  a  statististical 
analysis  of  the  observed  minus  forecast  increments,  as  will  be  shown  in  Chapter  3. 
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Figure  2.7  U-U  Vertical  Correlation  function  {kp  =  5) 


UvrbcHi  iCanwlaLKHn  u— T 

Figure  2.8  U-T  Vertical  Correlation  function  {kp  =  5) 
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Figure  2.9  T-U  Vertical  Correlation  function  {kp  =  5) 


Figure  2.10  T-T  Vertical  Correlation  function  {kp  =  5) 
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It  is  also  useful  to  derive  the  mutual  eorrelations  of  temperature  and  wind  field 
with  the  geopotential.  In  order  to  do  this,  elimination  of  u,v  from  (2.26-2.29)  leads  to: 

dT/dq)  =  -l/(Rd)  ^^ld(pdln(p)  (2.73) 

dim  =  -l/(Rd)  ^mMn(p)  (2.74) 

From  these  relations,  assuming  that  the  geopotential  autoeovarianee  ean  modeled 
as: 


Cov(^,  4>j)  =  R(x)  (Pi,Pj)  (2.75) 

we  can  derive  the  correlations: 

d/d(p  i  Cov(Ti,  0j)  =  -a^R/^  dR(T)/dT  di/d(pid/dln(pi)  (Pi>Pj)  (2.76) 

d/d?iiCov( (Pi, Tj)  =  -ajRd^  dR(x)/dx  dx/d?ijd/dln(pj) y^  (Pi,Pj)  (2.77) 

^/dXidcp  i  Cov(Ti, Tj)  =  -a^Rd^  dR(x)/dx  ^ x/dXidcpi  ^/dln(pi)dln(pj)y^  (P^Pj) 

(2.78) 

Separating  the  vertical  and  isobaric  components  of  the  correlations,  we  find: 


X^%iPj)  =  d/dlnpiy^  (pi,pj)  =  y^“(Pi,Pj)  (2.79) 

X^^(PiPj)  =  d/dlnpjy^  (pi,pj)  =  x‘‘^(PiPj)  (2.80) 

X^^(Pi,Pj)  =  ^/(dlnpidlnpj)  y**(pi,pj)  =y“"(pi,pj)  (2.81) 

while  the  isobaric  components  are  given  by: 

aT(7^R^^(ri,rj)  =  -aj  Rd^  dR(x)/dx  dx/dXj  (2.82) 

a-ro^R^^  (ri,rj)  =  -a^  Rd^  dR(x)/dx  dx/dxpi  (2.83) 

gt  Rd^^x/dXid(pidR(x)/dx  (2.84) 
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From  these  expressions  we  are  also  able  to  derive  the  geostrophieally  constrained 
autocovariances: 


Lim(r^o)  Cov(Ti,Tj)  =  oj  =  Rd'^Lim(^^o)  Cov(<Pi,4>j)  =  Rd^ aj  (2.85) 

In  analogous  fashion,  starting  from  (2.25-2.26)  the  cross  correlations  between  the 
wind  components  u,v  and  the  geopotential  can  be  derived.  The  results  of  the 
computations  are: 


(Pi, Pi)  =  X^(Pi.Pj)  =  x‘"'(Pi,Pj)  (2.86) 

X%i,Pj)  =  X'^  (Pi,Pj)  =  X^(Pi,Pj)  =  x"‘'(Pi,Pj)  (2.87) 

while  for  the  isobaric  cross  correlations  we  find: 

au<70R“^(ri,rj)  =  -a^  k/sincpi  dR(x)/dx  dx/dxpi  (2.88) 

a40uR^(fi,rj)  = -aj  k/sin(pj  dR(x)/dx  dx/dxpj  (2.89) 

ava^R'’^(ri,rj)  =  aj  k/(sin(pi  coscpi)  dR(x)/dx  dx/dAi  (2.90) 

a40v  R^(ri,rj)  =-a^  k/(sin(pj  cos(pj)  dR(x)/dx  dx/dAj  (2.91) 


2.  Surface  Analysis  Covariance  Model 

The  surface  analysis  covariances  used  in  the  objective  analyses  of  the  surface 
pressure  (SP),  mean  sea  level  pressure  (MSLP)  and  10-meter  wind  fields  are  a  simplified 
version  of  the  models  used  in  the  upper  air  analysis. 

In  this  case  only  the  geostrophic  constraints  (2.26-2.27)  apply.  The  computations 
are  similar  to  the  ones  described  above  and  lead  to  analogous  results,  with  the  notable 
exception  that,  assuming  that  the  pressure  autocovariance  is  modeled  as: 

Cov(pi,pj)  =  ap^  R(x)  (2.92) 

Then,  the  geostrophieally  constrained  wind  variances  are  given  by: 
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2  2  2 
Lim(t^o)  Cov(Ui,Uj)  =  Ou  =  Lim(r^o)  Cov(Vi,Vj)  =  =  -(apk/psin(pi)  L  (2.93) 


where  L  is  given  by  (2.47).  For  the  SOAR  model  of  (2.24)  the  wind  component 
autocovariances  reduce  to: 

(Tu  =  (7^  =  ((7pkc/psin(pif  (2.94) 

It  is  clear  that  this  correlation  model  is  really  applicable  only  under  the  provision 
that  the  geostrophic  components  of  the  observed  surface  winds  have  been  extracted.  This 
requires  the  use  of  an  appropriate  boundary  layer  model.  Work  in  this  direction  is  under 
way,  making  use  of  the  Planetary  Boundary  Layer  model  developed  by  R.A.  Brown  and 
coworkers  (Brown  and  Levy,  1986). 


3.  Anisotropy  and  Flow  Dependency  of  Covariance  Functions 

An  implicit  assumption  of  3D-Var  algorithms  (and,  in  general,  of  intermittent 
assimilation  schemes)  is  the  stationary  character  of  the  background  error  covariances. 
This  simplification,  together  with  those  of  isotropy  and  homogeneity,  is  widely  used  in 
operational  settings  due  to  the  reason  that  taking  into  account  the  temporal  evolution  of 
the  background  covariances  would  greatly  increase  the  complexity  of  the  objective 
analysis  algorithm  and  the  burden  placed  on  the  computing  resources.  However,  it  is  very 
well  known  (Daley,  1991;  Otte  et  ah,  2001),  that  the  observed-minus-background 
correlation  patterns  of  weather  systems  show  considerable  anisotropy  (the  SW-NE  tilt  of 
upper  level  trough  axis,  for  instance),  which  make  the  isotropy  and  stationarity 
approximations  serious  shortcomings  of  data  assimilation  systems  which  do  not  take  the 
time  evolution  of  the  covariance  matrices  into  account. 

The  problem  has  been  tackled  in  a  variety  of  ways.  In  the  context  of  variational 
assimilation  there  are  two  main  approaches: 


33 


1.  the  4D-Var  scheme,  where  the  background  covariance  matrix  is  evolved 
implicitly  by  the  dynamics  of  the  tangent  linear  model  (and  its  adjoint;  see 
Rabier  and  Courtier  (1992)  for  details); 

2.  the  Kalman-Bucy  filter,  wich  explicitly  evolves  the  background  covariance 
matrix,  and  the  many  approximations  which  have  been  proposed  in  order  to 
make  this  method  computationally  tractable  (“Ensemble  Kalman  filter”, 
Evensen,1994;  “Reduced-Rank  Kalman  filter”,  Eisher,  1998;  “The  Cycling 
Representer  algorithm”,  Xu  and  Daley,2000); 

Only  the  4D-Var  approach  has  been  implemented  in  operational  environments 
even  though  many  approximations  are  used  in  practice  (ECMWE;  Rabier  et  ah,  2000). 
The  Kalman  filter  method  is  under  active  study,  but  it  does  not  seem  feasible  to  be 
implemented  in  an  operational  setting  with  the  current  generation  of  computers.  This  is 
due  to  the  fact  that  the  method  requires  an  explicit  computation  of  the  analysis  error 
covariance  matrix  and  its  evolution  in  time  and  this  matrix,  for  current  numerical  weather 
prediction  models,  is  of  the  order  of  lO^xlO’. 

Due  to  the  unavailability  at  CNMCA  of  the  human  and  computer  resources  necessary 
for  tackling  the  problem  of  the  temporal  extension  of  3D-Var,  possible  ways  to 
circumvent  the  main  shortcomings  of  the  algorithm  were  investigated.  The  approach  we 
have  chosen  was  pioneered  by  Benjamin  (1989)  and  it  has  recently  been  actively 
investigated  by  many  researchers  (Miller  and  Benjamin,  1992;  Devenyi  and  Schlatter, 
1994;  Riishojgaard,  1998;  Otte  et  ah,  2000). 

In  this  method  the  isotropic,  stationary  covariance  model  described  in  the  preceding 
paragraphs,  is  modified  through  multiplication  by  an  anisotropic,  flow-dependent  term 
that  is  a  function  of  the  background  field  potential  temperature.  Eor  example,  the 
temperature  autocovariance  function  (2.29)  is  modified  as  follows: 


Cov(n  Tj)  =  ar'  R(x)  x^'^iPuPj)  y(l I ^(npi)  -  ^(r.pj)  U)  (2.95) 

In  the  present  implementation  the  anisotropic  component  v  is  modeled  by  a 
simple  SOAR  function  of  the  absolute  difference  of  the  background  potential 
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temperatures  at  loeations  ij.  This  is  a  eomputationally  eheap  way  of  obtaining  flow- 
dependent  eovarianees,  and  it  relies  heavily  on  the  aeeuraey  of  the  first  guess  potential 
temperature  field  in  order  to  produee  positive  results.  However,  it  might  be  argued  that 
also  in  the  Kalman  filter  approaeh  the  time  evolution  of  the  analysis  error  eovarianee 
matrix  depends  on  the  aeeuraey  of  the  previous  analysis  step  and  of  the  model  itself 
(aetually,  a  linearized  version  of  the  model).  We  note  in  passing  that  the  3-D  character  of 
this  formulation  should  also  be  helpful  in  the  objective  analysis  of  single  level 
observations  in  the  mass-wind  analysis  (SYNOP,  SHIP,  BUOY,  AIREP,  etc),  where  the 
use  of  the  statistically  derived  vertical  correlation  functions  (2.67-2.72)  can  be 
detrimental  when  it  does  not  take  into  account  the  presence  of  sharp  air-mass  boundaries 
(i.e.  boundary  layers  with  sharp  inversions,  tropopause  boundary). 

An  example  of  how  the  flow-dependent  term  impacts  the  objective  analysis  is 
now  discussed.  The  main  feature  of  the  synoptic  state  of  the  atmosphere  over  the  model 
domain  on  the  19*  of  June  2002,  OOUTC,  is  the  elongated  upper  level  trough  whose 
cyclonic  part  extends  over  northwestern  Spain,  the  British  Isles  and  Scandinavia  (Fig. 
2.11).  In  the  lower  troposphere  this  is  mirrored  by  a  sharp  air  mass  boundary  extending 
from  Spain  trough  France,  Germany  and  the  southern  portion  of  the  Scandinavian 
Peninsula  (Fig.  2.12).  In  this  assimilation  cycle  a  radiosonde  near  Paris  reported  a  500 
hPa  temperature  1.5°C  higher  than  the  forecast  temperature.  The  way  in  which  this 
observation  increment  is  spatially  interpolated  in  the  ensuing  objective  analysis  can  be 
seen  in  Fig.  2.13.  The  air  mass  boundary  present  at  500  hPa  (Fig.  2.14)  clearly  models 
the  shape  of  the  correlation  function  in  an  anisotropic  and  flow-dependent  way. 

Although  these  results  look  promising  and  intuitively  appealing,  the  merits  of  the 
method  must  be  evaluated  in  an  objective  way,  through  comparisons  of  statistical  skill 
scores  of  the  forecast  fields  derived  from  objective  analysis  performed  both  with  and 
without  the  flow-dependent  term.  This  work  is  in  progress. 
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Figure  2.11  CNMCA  500  hPa  Geopotential  height  and  Temperature  Analysis: 
June  19*’’  2002,  OOUTC. 


Figure  2.12  CNMCA  850  hPa  Wet  Bulb  Potential  Temperature  Analysis:  June 


19*2002,  OOUTC. 
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ROME  Analysis:  Mercoledi  19  Giugno  2002 
T  SOOhPa  Analysis:  increment  over  First  Guess 


Figure  2.13  CNMCA  500  hPa  Temperature  Analysis  Increment  over  First 
Guess:  June  19*  2002,  OOUTC. 
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Figure  2.14  CNMCA  500  hPa  Potential  Temperature  Analysis:  June  19*  2002, 
OOUTC. 
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III.  BACKGROUND  AND  OBSERVATION  ERRORS 

STATISTICS 


In  order  to  maximize  the  amount  of  information  extracted  from  the  observations 
and,  at  the  same  time,  reject  spurious  noise  (i.e.,  information  related  to  spatial  scales 
which  the  numerical  model  and,  consequently,  the  objective  analysis  algorithm,  cannot 
resolve),  it  is  vital  that  the  background  (P^  )  and  the  observation  (R)  error  covariance 
matrices  be  specified  as  accurately  as  possible. 

Recalling  their  definitions: 

Pb=<ebeJ>  (3.1) 

R=<eoeo^>  (3.2) 

and  the  fact  that  both  errors  Cb  and  Co  refer  to  unknown,  unbiased,  “true”  values,  it  is  clear 
that  we  cannot  compute  these  quantities  from  first  principles.  What  is  done  in  practice  is 
to  estimate  these  covariances  by  a  statistical  procedure:  we  are  assuming  that  these  errors 
are  stationary  in  time  and  uniform  over  our  spatial  domain  in  order  to  derive  their 
“climatology”.  From  this  and  the  underlying  assumptions  that  the  errors  are  normally 
distributed  and  unbiased,  we  are  then  able  to  compute  the  second  moments  (variances  and 
covariances)  of  their  probability  density  functions  (pdf). 

In  practice  this  calculation  can  be  done  in  at  least  three  different  ways: 

1.  The  ^"Observation  method"  (Rutherford,  1972;  Hollingsworth  and  Lonnberg, 
1986).  In  this  procedure  observation-minus-first  guess  differences  are  collected 
for  a  period  of  time  over  a  network  of  homogeneous  and  uncorrelated  observing 
stations.  From  this  data,  which  can  be  further  stratified  by  pressure  level  and  time 
of  the  year,  the  spatial  statistics  of  the  covariances  can  be  computed,  together  with 
the  perceived  background  and  observation  error  variances; 
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2.  The  ‘"NMC  method’’’  (Parrish  and  Berber,  1992).  The  main  assumption  of  the 
method  is  that  the  statistical  structure  of  the  background  correlations  does  not 
change  significantly  over  the  first  48h  of  model  integration.  Then  it  is  possible  to 
derive  the  background  covariances  from  a  statistical  analysis  of  the  differences  of 
the  48h  and  24h  forecast  fields  verifying  at  the  same  time.  The  method  is  suitable 
for  global  numerical  prediction  systems  and  also  very  practical  and 
straightforward  to  set  up.  On  the  other  hand  the  main  hypothesis  it  rests  upon  is 
rather  difficult  to  justify; 

3.  The  ''Analysis-Ensemble  method’’  (Fisher,  2001).  This  method  is  the  one 
currently  used  at  ECMWF.  An  ensemble  of  different  analyses  is  realized  by 
randomly  perturbing  the  initial  observations  within  the  assumed  observation  error. 
These  different  analyses  are  then  integrated  in  time  till  the  next  objective  analysis, 
where  the  process  is  repeated.  After  a  few  days  the  differences  in  the  first  guess 
fields  should  be  representative  of  the  underlying  background  error  statistics. 

The  method  chosen  to  specify  the  parameters  involved  in  the  modeling  of  the 
background  and  observation  error  matrices  is  the  Observation  method.  The  reason  for 
this  is  that  it  is  a  direct  and  theoretically  sound  way  of  deriving  the  background  spatial 
correlations  and,  as  a  bonus,  it  is  also  capable  of  providing  an  estimate  of  the  background 
and  observation  variances,  whose  relative  magnitude  is  perhaps  the  most  fundamental 
quantity  to  be  specified  in  every  objective  analysis  algorithm. 

The  main  steps  in  the  computations  and  the  more  relevant  results  will  be 
described  in  the  following  paragraphs.  For  more  details,  see  Vocino  (2002).  Also  an 
account  of  the  observations  currently  used  and  their  assumed  error  statistics  will  be  given. 


A.  BACKGROUND  ERROR  STATISTICS 

In  order  to  derive  the  necessary  statistics  on  the  background  error,  the 
"Observation  method'  has  been  used.  The  entire  network  of  land  radiosonde  stations 
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present  in  the  analysis  domain  (165)  has  been  exploited  in  the  eomputations:  the  data 
refer  to  the  OOZ  and  12Z  synoptie  times  over  a  three  month  period  starting  from  the  13*’’ 
of  March  2002.  The  temperature  6-h  observation  increments  (denoted  as  Ok-Bk)  were 
collected  for  the  standard  pressure  levels  (1000  hPa,  925  hPa,  850  hPa,  700  hPa,  500  hPa, 
400  hPa,  300  hPa,  200  hPa,  100  hPa,  50  hPa). 

After  removing  the  bias  for  each  station,  the  following  estimate  of  observation 
increments  correlations  was  computed  for  each  pair  of  stations  k,h 

Rm  =  <(Ok-Bk)  (Oi-Bi)>  /(<(Ok-Bk)^>‘^'<  (Oi-Bif>^^^  (  3.1) 


Due  to  the  separability  assumption,  we  can  study  the  quasi-horizontal  (isobaric) 
correlations  independently  from  the  vertical  correlations. 

In  order  to  study  the  isotropic  component  of  the  isobaric  correlations  Rm,  they 
have  been  partitioned  into  intervals  of  0.01  rad  (~200  Km)  of  their  mutual  great  circle 
distance.  In  each  of  these  intervals  the  Fisher  z-transform  of  the  empirical  correlation 
coefficients  has  been  performed: 


Z  =  l/2log((l+R)/(l-R)) 


(3.2) 


This  has  been  done  in  order  to  preserve  the  assumed  normal  distribution  of  the 
correlation  coefficients  around  their  mean  value,  and  it  has  proved  beneficial  in  the 
successive  fit  of  the  correlation  models  to  the  empirical  data  (Fig  3.1). 
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Figure  3.1  Example  of  correlation  fit  -with  arithmetical  and  Fisher  z-transform 
mean. 


The  model  functions  fitted  to  the  empirical  correlation  coefficients  have  been  the 
Negative  Squared  Exponential  (NSE)  function: 

Pb=  exp(-0.5(r/Lc)  (3.3) 

and  the  Second  Order  Autoregressive  (SOAR)  function: 

Pb=  (1  +r/Lc)exp-(r/Lc)  (3.4) 

What  -we  are  trying  to  determine  is  the  correlation  length  Lc,  'which  can  be  defined 
for  any  correlation  model  as  (Daley,  1991): 

L,^(-2p/V^pf\=o  (3.5) 
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where  the  Laplaeian  operator  due  to  the  isotropy  assumption,  reduces  to: 


S^=l/r  d/dr(rd/dr)  (3.6) 

Sample  results  of  the  fits  are  given  below  (distances  are  in  radians  units  of  Earth’s 

radius). 


T-T  Correlation  Function  at  0300  hPa 


Figure  3.2  Correlation  fit  for  isobaric  (300  hPa)  Temperature  Observation 
increments. 
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T-T  Correlation  Function  at  0500  hPa 
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Fig.  3.3:  Correlation  fit  for  isobaric  (500  hPa)  Temperature  Observation  increments. 


T-T  Correlation  Function  at  0850  hPa 


Fig.  3.4:  Correlation  fit  for  isobaric  (850  hPa)  Temperature  Observation  increments. 
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The  main  quantitative  results  are  summarized  in  the  following  two  tables: 


Pressure  Level  (hPa) 

Lc  (Km) 

Fit  rmse 

1000 

713 

0.0618 

925 

339 

0.0321 

850 

426 

0.0362 

700 

378 

0.0244 

500 

409 

0.0304 

400 

368 

0.0250 

300 

517 

0.0441 

200 

539 

0.0214 

100 

893 

0.0488 

50 

809 

0.0421 

Table  3.1  Correlation  lengths  and  rmse  of  fit  for  Temperature  Observation 
inerements:  NSE  eorrelation  model. 
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Pressure  Level  (hPa) 

Lc  (Km) 

Fit  rmse 

1000 

417 

0.0560 

925 

201 

0.0259 

850 

261 

0.0291 

700 

221 

0.0186 

500 

242 

0.0251 

400 

220 

0.0195 

300 

296 

0.0443 

200 

318 

0.0251 

100 

544 

0.0329 

50 

482 

0.0281 

Table  3.2  Correlation  lengths  and  rmse  of  fit  for  Temperature  Observation 
inerements:  SOAR  correlation  model. 


From  inspection  it  is  clear  that  the  SOAR  function  gives  a  better  fit  to  the  empirical 
correlations  at  almost  all  levels:  this  is  mainly  due  to  the  better  agreement  of  the  SOAR 
model  function  with  the  data  at  intermediate  distances  (0.04-0.08  radians),  where  the 
NSE  function  is  seen  to  fall  off  too  steeply.  The  correlation  lengths  have  been  plotted  for 
ease  of  reference  in  Fig.  3.5. 
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Vertical  Profile  of  T-T  Correlation  Length  Scale  Parameter 


Figure  3.5  Vertical  profiles  of  Correlation  Lengths  for  Temperature  Observation 
increments. 


Apart  from  the  anomalously  high  value  at  1000  hPa  (whose  cause  is  under 
investigation),  the  correlation  lengths  appear  to  have  a  plausible  height  profile.  They  are 
approximately  constant  over  much  of  the  troposphere  and  increase  considerably  in  the 
stratosphere.  The  lower  correlation  length  values  shown  by  the  SOAR  model  agree  with 
the  more  gradual  incline  of  the  function  at  large  distances,  where  the  linear  term  becomes 
dominant. 

For  the  experimental  runs  of  the  objective  analysis  procedure,  the  SOAR  model 
has  been  selected,  with  a  constant  tropospheric  correlation  length  of  Lc  =  250  Km  and  a 
constant  stratospheric  (i.e. /f/ev^250  hPa)  correlation  length  of  Lc  =  400  Km. 

Another  appealing  feature  of  the  "^Observation  method"  is  the  possibility  of  deriving 
objective  estimates  of  the  background  and  observation  errors’  variances,  which  arguably 
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are  the  most  important  parameters  to  be  speeified  in  an  objeetive  assimilation  algorithm. 
How  this  is  done  follows  from  the  definition  of  the  eorrelation  eoeffieient  Rm  (3.1): 
assuming  homogeneity  of  the  errors  and  uneorrelated  observation  errors  (as  is  the  case 
for  independent  radiosonde  measurements),  it  can  be  shown  (Daley,  1991)  that: 


Rz  =limr^Rki(r)  =  Eb^/(Eb^+Eo^)  (3.7) 

where  Eb  ,Eo  are  the  background  field  and  observation  variances  respectively.  This  is 

what  can  intuitively  be  expected,  since  the  observations  are  mutually  uneorrelated:  for 

distances  r  close,  but  not  equal  to  0,  the  correlation  can  only  be  explained  by  the 

background  term  contribution,  so  that  extrapolating  to  zero  distance  gives  the  relative 
weight  of  the  background  term  with  respective  to  the  total  (background  +  observation) 
variance. 

On  the  other  hand  the  total  variance  of  the  observation  increments  can  be  easily 
computed  from: 

W  =  1/K  Ek<(Ok-Bk)^>  (3.8) 

so  that,  using  (3. 7-3. 8)  each  variance  can  be  calculated.  The  results  of  these  computations 
have  been  summarized  in  Tables  3.3  and  3.4. 
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Pressure  Level 
(hPa) 

EU°C) 

Eo(°C) 

E,(°C) 

1000 

1.52 

2.19 

2.67 

925 

1.74 

1.19 

2.36 

850 

1.26 

1.31 

1.82 

700 

1.24 

1.10 

1.66 

500 

1.46 

1.58 

2.16 

400 

1.56 

1.64 

2.26 

300 

2.24 

1.37 

2.63 

200 

2.72 

0.94 

2.88 

100 

1.15 

1.16 

1.63 

50 

1.25 

1.32 

1.82 

Table  3.3  Background  error,  Observation  error  and  Total  perceived  error  for 
Temperature  Observation  increments:  NSE  correlation  model. 


Pressure  Level 
(hPa) 

EU°C) 

Eo(°C) 

Et(°C) 

1000 

1.62 

2.12 

2.67 

925 

1.86 

1.45 

2.36 

850 

1.33 

1.24 

1.82 

700 

1.32 

1.00 

1.66 

500 

1.56 

1.49 

2.16 

400 

1.66 

1.54 

2.26 

300 

2.39 

1.08 

2.63 

200 

2.87 

0.21 

2.88 

100 

1.20 

1.10 

1.63 

50 

1.32 

1.25 

1.82 

Table  3.4  Background  error,  Observation  error  and  Total  perceived  error  for 
Temperature  Observation  increments:  SOAR  correlation  model. 
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pressure  (hPa) 


Also  these  values  have  been  plotted  for  ease  of  reference  in  Fig.3.6  and  3.7. 


Vertical  Profile  of  T  Root  Mean  Square  Errors 


Figure  3.6  Vertical  profile  of  Background  error,  Observation  error  and  Total 
perceived  error  for  Temperature  Observation  increments:  NSE 
correlation  model. 
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Vertical  Profile  of  T  Root  Mean  Square  Errors 


Figure  3.7  Vertical  profile  of  Background  error,  Observation  error  and  Total 
perceived  error  for  Temperature  Observation  increments:  SOAR 
correlation  model. 

From  the  plots  it  is  clear  that,  with  the  exception  of  the  200  hPa  level,  the 
background  and  observed  root  mean  square  errors  are  fairly  constant  with  height  and  of 
comparable  magnitude,  in  the  range  of  1.2-1.5°C:  these  values  are  compatible  with  the 
expected  accuracy  of  radiosonde  observations  and  6-h  forecast  fields. 

The  statistical  calculation  of  the  parameters  in  the  vertical  component  has  been 
performed  in  analogous  fashion.  Since,  from  inspection  of  2.67-70,  2.78-80  and  2.85-86, 
it  is  clear  that  all  vertical  correlations  can  be  expressed  in  term  of  X^(Pi’Pj)  =  x"‘(PiPj), 
the  observed  vertical  correlations  of  the  geopotential  background  increments  have  been 
fitted  to  the  model  function  (1  +kplog^(pi/pj)X^  (Fig. 3. 8). 
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Fit  of  Height  Background  Error  Vertbai  Correiation 


Fig.  3.8  ;  Fit  of  vertical  profile  of  geopotential  height  background  minus 
observation  increments:  equation  (2.68)  correlation  model. 


The  fit  is  satisfactory  (RMSE  =  0.101),  giving  for  the  fitting  parameter  kp=  1.77, 
which  is  the  value  used  in  the  experimental  trials  of  the  objective  analysis  scheme. 


B.  OBSERVATION  ERROR  STATISTICS 

Since  the  new  objective  analysis  scheme  is  still  in  an  experimental  stage  and  it 
was  felt  it  was  important  to  validate  the  merits  of  the  new  algorithm  and  the  revised 
background  statistics  and  correlation  functions,  the  observations  used  are  the  same 
ones  used  in  the  current  operational  Optimum  Interpolation  analysis.  A  brief  account 
of  their  error  statistics  will  be  given  below. 
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1.  Radiosondes  and  Pibals 

Even  today,  radiosonde  reports  (TEMP)  are  the  main  eomponent  of  the  global 
weather  observing  system  in  the  Northern  Hemisphere.  On  the  main  synoptic  hours  (00 
and  12  EITC)  about  165  reports  are  routinely  analyzed  (around  80  reports  at  06  and  18 
UTC  synoptic  hours).  Mandatory  and  significant  level  observations  of  temperature, 
relative  humidity  and  wind  are  assimilated.  Observations  at  levels  not  coincident  with 
isobaric  analysis  levels  are  vertically  (linearly  in  log(p))  interpolated.  Erom  the  results 
shown  in  the  previous  section,  the  assumed  accuracy  (RMSE)  of  the  temperature 
observations  have  been  set  to  1.2°C;  no  radiative  corrections  are  employed.  Eor  relative 
humidity  the  assumed  accuracy  has  been  set  to  5%.  Perceived  winds  accuracy  varies  from 
2.1ms'^  in  the  lower  levels  to  3.5  ms'^  in  the  stratosphere. 

A  major  advantage  of  the  new  objective  analysis  is  that  it  employs  temperatures 
instead  of  geopotential  heights,  as  the  previous  one  did.  This  brings  about  two  positive 
features:  the  temperature  observation  errors  are  much  less  vertically  correlated  than  the 
geopotential  errors  and  the  background  error  correlations  for  temperature  are  sharper  than 
the  corresponding  geopotential  correlations.  The  end  result  of  these  two  factors  is  that 
higher  vertical  resolution  analysis  is  possible.  Also,  on  a  more  technical  level,  making  the 
assumption  of  vertically  uncorrelated  temperature  observation  errors  drastically  improves 
the  condition  number  of  the  analysis  equations,  thus  greatly  reducing  the  computation 
time  required  for  the  minimization  of  the  cost  function.  Pilot  weather  balloons  (Pibals) 
wind  reports  are  also  routinely  analyzed,  with  expected  observation  errors  somewhat 
greater  than  those  assigned  to  radiosonde  reports. 

2.  Aircraft  Based  Observations 

Aircraft  based  temperature  and  wind  observations  are  routinely  assimilated,  after 
they  have  undergone  the  preprocessing  step  of  the  analysis  cycle,  where  consistency 
checks  on  the  moving  platform  reported  position  and  climatological  gross  limit  checks 
are  performed. 

Automatic  observations  (AMDAR,  Aircraft  Meteorological  Data  Relay)  are  given 
greater  weight  (i.e.,  smaller  observation  RMSEs)  over  manual  observations  (AIREP, 
Aircraft  Reports).  Both  are  treated  as  uncorrelated,  single  level  reports. 
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3.  Atmospheric  Motion  Winds 

Atmospheric  motion  winds  are  derived  from  tracking  the  movement  of  identified 
cloud  structures  from  a  sequence  (i.e.  three)  of  geostationary  satellite  images.  These 
observations  are  not  routinely  analyzed  in  the  current  operational  objective  analysis,  but 
their  use  is  envisaged  in  the  new  data  assimilation  system.  It  is  felt,  however,  that,  due  to 
the  current  uncertainties  in  the  height  assignment  of  the  winds,  research  is  still  needed  in 
order  to  design  an  observation  operator  H  able  to  extract  all  the  relevant  information. 

4.  Conventional  Surface  Observations 

Conventional  surface  observations,  both  over  land  (SYNOP)  and  over  the  sea 
(SHIP,  BUOYS)  are  routinely  analyzed.  After  preprocessing  and  superobbing  the  reports 
closer  than  the  average  grid  spacing  (~55  Km),  around  1200  observations  are  fed  into 
objective  analysis  algorithm.  For  the  upper  air  analysis  surface  pressure  innovations  are 
transformed  into  geopotential  innovations  of  the  closest  analysis  level  and  their  impact  on 
the  analyzed  variables  {T,u,v)  is  evaluated  through  the  use  of  the  correlations  2.78-2.90. 

Surface  fields  (Surface  pressure.  Mean  Sea  Level  pressure,  10-m  wind)  are 
analyzed  through  a  two-dimensional,  multivariate  version  of  the  algorithm  (see  Section 
II. C. 2).  Winds  are  only  assimilated  over  sea.  At  the  moment  geostrophy  is  not  strictly 
enforced,  using  a  geostrophic  coupling  parameter  ju=0.7.  However  it  is  felt  that  a  more 
accurate  estimate  of  the  geostrophic,  balanced  component  of  the  observed  winds  is 
needed  in  order  to  make  good  use  of  the  wind  observations.  Ways  to  accomplish  this  by 
taking  into  account  the  structure  of  the  forecast  marine  boundary  layer  are  currently  being 
investigated. 

The  Mean  Sea  Level  pressure  (MSLP)  is  well  defined  over  the  sea  but,  over  land, 
is  more  of  a  visual  aid  for  forecasters  than  a  real  meteorological  field.  However  it  is  felt 
to  be  important  that  the  analyzed  MSLP  field  closely  draws  to  the  accepted  MSLP  and 
surface  wind  observations.  To  accomplish  this,  it  has  been  found  necessary  to  objectively 
analyze  the  MSLP  and  surface  wind  departures.  Reduction  of  surface  pressure  (SP) 
analysis  to  mean  sea  level  did  not  give  satisfactory  results,  mainly  because  of  the 
differences  between  model  temperatures  and  observing  station  temperatures. 
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5.  Scatterometer  Winds 

Scatterometers  are  satellite-borne  radars,  whieh  provide  measurement  of  surfaee 
wind  speed  and  direetion  over  sea.  There  is  ambiguity,  however,  in  the  wind  direetions, 
and  four  different  wind  direetions  are  eompatible  with  eaeh  measurement.  This  ambiguity 
ean  be  resolved  either  by  ehoosing  the  direetion  eloser  to  the  first  guess  or  by  more 
advaneed  methods  that  take  into  aeeount  the  spatial  coherence  properties  of  the  retrieved 
wind  field. 

At  the  moment  work  is  under  way  to  assimilate  the  wind  field  product  generated 
by  the  Dutch  Meteorological  Institute  (KNMI)  from  the  QuikSCAT  satellite  observations 
(SeaWinds).  This  is  “...a  near  real-time  100-km  resolution  QuikSCAT  product,  which 
includes  inversion.  Quality  Control,  and  a  2D-Var  ambiguity  removal  algorithm...” 
making  it  suitable  for  data  assimilation  purposes  (for  more  information,  see 
www.knmi.nl/onderzk/applied/scattmtr/quikscat/index.html). 
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IV.  VALIDATION  AND  PRELIMINARY  RESULTS 


There  are  many  possible  ways  through  whieh  the  quality  of  the  objectively 
analyzed  fields  can  be  gauged; 

1.  A  subjective,  “synoptic”  evaluation  of  the  charts  produced  by  the  data 
assimilation  cycle  from  a  forecaster’s  perspective,  assessing  the  adherence  to  the 
observations  (especially  those  not  used  in  the  analysis  scheme,  such  as  satellite 
imagery)  and  the  degree  of  enforcement  of  “physical”  balance  properties; 

2.  Use  of  internal  diagnostics,  such  as  the  analysis  error  covariance  matrix,  which 
can  be  computed  as 

Pa  =  (Pb‘+H^R‘H)‘  (4.1) 

It  is  common  to  compute  only  the  diagonal  elements  of  the  matrix,  which  give  an 
estimate  of  the  variances  of  the  analyzed  variables.  Unfortunately  eq.  (4.1)  strictly 
holds  on  condition  that  the  background  and  observation  error  covariance  matrices 
{Pb,R)  have  been  correctly  specified,  which  is  not  usually  the  case.  If  this  is  not 
true,  then  eq.(4.I)  underestimates  the  real  analysis  error  covariances 

3.  A  statistical,  “objective”  verification  through  comparison  of  forecasts  produced 
from  the  analyzed  fields  with  other  forecasts  started  from  independent  data 
assimilation  cycles. 

This  later  approach  has  been  taken  in  this  study.  Details  of  its  implementation  will 
be  given  below,  together  with  a  discussion  of  the  results  so  far  obtained. 
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A.  METHOD  OF  VERIFICATION 

In  order  to  assess  the  quality  of  the  analysis  fields  in  an  objeetive  manner,  two 
parallel  runs  of  the  HRM  model  (see  Section  I.A.3)  have  been  set  up,  running  at  OOUTC 
every  day  up  to  T+48h:  one  starting  from  the  analysis  fields  of  the  new  data  assimilation 
cycle,  the  other  from  the  analyzed  ECMWF  fields  interpolated  to  the  HRM  model  grid. 
Apart  from  the  initial  conditions,  all  the  other  features  of  the  two  model  integrations  are 
equal  (boundary  conditions,  resolution,  etc.).  This  should  guarantee  that  any  difference  in 
the  subsequent  forecast  fields  should  be  traced  back  to  differences  in  the  initial 
conditions.  The  accuracy  of  the  forecast  fields  is  estimated  through  the  use  of  two 
common  scalar  measures  of  gridded  fields:  the  root  mean  square  error  (RMSE)  and  the 
anomaly  correlation  (AC). 

Denoting  by/y  the  grid  values  of  the  forecast  field  and  ay  the  grid  values  of  the 
verifying  analysis,  then  the  RMSE  is  given  by  (Wilkes,  1995): 

RMSE  =  (l/(MN)Ei=j_M^j=j_N(fy  -  ay)f^  (4.1) 

This  skill  score  is  what  we  might  define  as  an  absolute  measure  of  the  forecast’s 
quality.  However  it  is  not  able  to  distinguish  if  the  errors  are  related  to  biases  in  the 
forecast  fields  or  to  the  misplacement  of  significant  weather  patterns.  In  order  to  do  this 
the  anomaly  correlation  score  is  useful  (Wilkes,  1995).  If  we  define  Cy  the  climatological 
averages  of  the  analyzed  fields  at  each  grid  point,  then: 

AC  =  (Ei=,,M^j=J.N(fii-  Cy)  (ay-  Cy))/((  E  -  Cif  (E=j,MEj=J,N(ay  -  Ci/)) 

(4.2) 

From  this  expression  it  is  clear  that  we  are  investigating  the  correlations  between  the 
anomalies  with  respect  to  climatology  of  the  forecast  and  the  analyzed  fields.  In  this  way 
we  are  highlighting  the  pattern  similarities  between  the  two  fields,  while  giving  less 
weight  to  their  absolute  values. 

A  final  point  to  be  made  is  that  the  verifying  analyses  are  the  ECMWF  analyses 
interpolated  on  to  the  HRM  grid.  This  choice  has  been  made  in  order  to  test  the  quality  of 
the  forecast  fields  derived  from  the  data  assimilation  cycle  under  their  most  unfavorable 
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conditions.  The  statistical  errors  so  derived  should  then  represent  an  upper  limit  to  the 
expected  forecast  errors. 


B.  PRELIMINARY  RESULTS 

Some  results  from  a  statistical  comparison  of  the  model  runs  over  30  cases  (15* 
June  2002  -  15*  July  2002)  are  shown  in  Fig.4.1  through  4.8  for  the  mean  sea  level 
pressure,  geopotential  height  and  wind  speed  fields.  From  inspection  of  the  charts,  it  is 
evident  that: 


1.  Degradation  in  quality  with  respect  to  the  ECMWF  based  model  run  is  within 
acceptable  limits  (~  0.7  hPa  for  MSLP,  from  5  to  10  gpm  for  geopotential  height, 
from  0.9  m/s  at  850  hPa  to  2.3  m/s  at  the  jet  level  height  for  wind  speed); 

2.  The  anomaly  correlation  scores  for  both  the  MSLP  and  500  hPa  geopotential 
height  field  are  very  close:  this  is  an  indication  that  the  location  of  weather  system 
is  correctly  placed  in  the  CNMCA  analysis’  derived  forecast  fields  and  that  the 
main  source  of  error  is  to  be  found  in  the  diagnosed  intensities; 

3.  The  error  evolution  is  smooth  in  time,  indicating  that  the  analysis  fields  are 
perturbing  the  first  guess  fields  in  a  “balanced”  and  physically  reasonable  manner. 
The  gradual  decrease  in  the  RMSE  differences  between  the  two  model  runs  is  to 
due  to  the  steady  increase  of  influence  of  the  common  ECMWE  derived  boundary 
conditions. 


It  should  also  be  borne  in  mind  the  different  spatial  and  vertical  resolution  of  the 
ECMWE  analyses  (which  are  the  results  of  4D-Var  minimizations  at  half  the  operational 
model  resolution)  with  respect  to  the  CNMCA  analyses.  The  RMSE  scores  thus 
incorporate  an  intrinsic  error  of  representativeness  whose  magnitude  has  not  yet  been 
determined. 
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MSLP  RMSE 


CNMCA_ANA  — ECMWF_ANA 


Figure  4.1  RMSE  of  Mean  Sea  level  pressure  forecast  fields  vs.  ECMWF 
analysis. 


Figure  4.2  RMSE  of  850  hPa  geopotential  height  forecast  fields  vs.  ECMWF 
analysis. 
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Z500  hPa  RMSE 

— CNMCA_ANA  — ECMWF_ANA 


Figure  4.3  RMSE  of  500  hPa  geopotential  height  forecast  fields  vs.  ECMWF 
analysis. 


Z300  hPa  RMSE 

— CNMCA_ANA  — ECMWF_ANA 


Forecast  time 


Figure  4.4  RMSE  of  300  hPa  geopotential  height  forecast  fields  vs.  ECMWF 
analysis. 
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U850  hPa  RMSE 


CNMCA_ANA  — ECMWF_ANA 


Figure  4.5  RMSE  of  850  hPa  wind  speed  forecast  fields  vs.  ECMWF  analysis. 


U500  hPa  RMSE 


CNMCA_ANA  — ECMWF_ANA 


Figure  4.6  RMSE  of  500  hPa  wind  speed  forecast  fields  vs.  ECMWF  analysis. 
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U300  hPa  RMSE 


— CNMCA_ANA  — ECMWF_ANA 


Figure  4.7  RMSE  of  300  hPa  wind  speed  forecast  fields  vs.  ECMWF  analysis. 


MSLP  Anom.  Corr. 


CNMCA_ANA  — ECMWF_ANA 


Figure  4.8  Anomaly  Correlation  of  MSEP  forecast  fields  vs.  ECMWE  analysis. 


63 


Z  500  hPa  Anom.  Corr. 


CNMCA_ANA  — ECMWF_ANA 


Figure  4.9  Anomaly  Correlation  of  500  hPa  Geopotential  height  forecast  fields  vs. 
ECMWF  analysis. 
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V.  CONCLUSIONS  AND  SUGGESTIONS  FOR  FURTHER 

DEVELOPMENT 


In  this  work  the  main  ideas  behind  the  new  data  assimilation  eyele  being 
implemented  at  Italian  Air  Foree  Weather  Serviee  have  been  presented.  The  system  is  in 
a  relatively  early  stage  of  development  and  many  important  eomponents  are  still  missing 
in  order  to  make  it  truly  effeetive  and  operationally  viable.  However  it  is  felt  that  the 
basie  building  bloeks  of  the  system  have  been  set  up: 

1 .  The  pre-proeessing  and  quality  eontrol  of  the  observations; 

2.  The  new  3-D  eovarianee  model  in  spherieal  eoordinates; 

3.  The  deseent  algorithm  for  the  minimization  of  the  eost  funetion; 

4.  The  statistieal  evaluation  of  the  baekground  error  eovarianees. 

At  this  point,  the  development  effort  will  eoneentrate  mainly  on  the  following 
areas,  whieh  are  deemed  to  be  the  most  urgent  requirements  for  an  operational  use  of  the 
new  system: 

1.  The  implementation  of  an  effeetive  buddy  eheek  algorithm  for  the  sereening  of 
marginal  observations.  Given  the  global  nature  of  the  analysis  seheme,  a  loeal 
method  sueh  as  that  first  proposed  by  Lorene  (1981)  would  not  be  appropriate. 
Approximations  to  modern  methods  following  Daley  and  Barker  (2000,  Seetion 
9.3)  are  being  investigated. 

2.  The  implementation  of  an  effeetive  and  easy  to  eompute  preeonditioner  for  the 
minimization  of  the  eost  funetion  is  also  being  pursued.  Currently  the  time 
required  for  the  minimization  algorithm  is  within  aeeeptable  limits,  but  it  is 
expeeted  that  the  inelusion  of  a  larger  number  of  satellite  observations  will  ehange 
this. 
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3.  The  extension  of  the  algorithm  to  observations  that  are  not  linearly  related  to  the 
state  variables.  This  work  is  well  under  way  for  seatterometer  winds  and  will  be 
started  for  column  precipitable  water  observations  and  satellite  radiances.  The 
direct  assimilation  of  satellite  radiances  is  a  long  term  goal  whose  main  value  is 
not  expected  to  be  so  much  in  improving  the  current  analyses  over  the  present 
analysis  domain  as  in  the  experience  to  be  gained  in  view  of  the  future  availability 
of  much  improved  observations  from  satellite  hyperspectral  sounders. 

The  results  shown  in  the  previous  section  indicate  that  the  skill  of  the  forecast 
fields  derived  from  the  new  assimilation  cycle  still  lags  the  skill  of  ECMWF  derived 
forecasts  by  18-24  h  when  verified  with  respect  to  ECMWF  analyses.  As  mentioned  in  an 
earlier  section,  part  of  this  is  due  to  representativeness  error.  It  is  felt  that  this  error  can  be 
reduced  within  6-12h  when  more  observations  will  have  been  added  and  some  parameter 
optimizations  will  have  been  performed. 
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